<!DOCTYPE html>
<html lang="en">
<head>
	<meta charset="UTF-8">
	<meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1">
	<title>使用语言分析器 | Elasticsearch: 权威指南 | Elastic</title>
    <!-- Give IE8 a fighting chance -->
    <!--[if lt IE 9]>
    <script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
    <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
    <![endif]-->
	<link rel="stylesheet" type="text/css" href="../static/styles.css" />
</head>
<body>
<div class="main-container">
    <section id="content">
        
        <div class="content-wrapper">
            <section id="guide" lang="zh_cn">
                <div class="container">
                    <div class="row">
                        <div class="col-xs-12 col-sm-8 col-md-8 guide-section">
                            <div style="color:gray; word-break: break-all; font-size:12px;">原文地址: <a href="https://www.elastic.co/guide/cn/elasticsearch/guide/current/using-language-analyzers.html" rel="nofollow">https://www.elastic.co/guide/cn/elasticsearch/guide/current/using-language-analyzers.html</a>, 版权归 www.elastic.co 所有<br/>
                            英文版地址: <a href="https://www.elastic.co/guide/en/elasticsearch/guide/current/using-language-analyzers.html" rel="nofollow">https://www.elastic.co/guide/en/elasticsearch/guide/current/using-language-analyzers.html</a>
                            </div>
                        <!-- start body -->
                  <div class="page_header">
<b>请注意:</b><br>本书基于 Elasticsearch 2.x 版本，有些内容可能已经过时。
</div>
<div id="content">
<div class="breadcrumbs">
<span class="breadcrumb-link"><a href="index.html">Elasticsearch: 权威指南</a></span>
»
<span class="breadcrumb-link"><a href="languages.html">处理人类语言</a></span>
»
<span class="breadcrumb-link"><a href="language-intro.html">开始处理各种语言</a></span>
»
<span class="breadcrumb-node">使用语言分析器</span>
</div>
<div class="navheader">
<span class="prev">
<a href="language-intro.html">« 开始处理各种语言</a>
</span>
<span class="next">
<a href="configuring-language-analyzers.html">配置语言分析器 »</a>
</span>
</div>
<div class="section">
<div class="titlepage"><div><div>
<h2 class="title">
<a id="using-language-analyzers"></a>使用语言分析器<a class="edit_me edit_me_private" rel="nofollow" title="Editing on GitHub is available to Elastic" href="https://github.com/elasticsearch-cn/elasticsearch-definitive-guide/edit/cn/200_Language_intro/10_Using.asciidoc">edit</a>
</h2>
</div></div></div>
<p>Elasticsearch 的内置分析器都是全局可用的，不需要提前配置，它们也可以在字段映射中直接指定在某字段上：</p>
<div class="pre_wrapper lang-js">
<pre class="programlisting prettyprint lang-js">PUT /my_index
{
  "mappings": {
    "blog": {
      "properties": {
        "title": {
          "type":     "string",
          "analyzer": "english" <a id="CO124-1"></a><i class="conum" data-value="1"></i>
        }
      }
    }
  }
}</pre>
</div>
<div class="calloutlist">
<table border="0" summary="Callout list">
<tr>
<td align="left" valign="top" width="5%">
<p><a href="#CO124-1"><i class="conum" data-value="1"></i></a></p>
</td>
<td align="left" valign="top">
<p><code class="literal">title</code> 字段将会用 <code class="literal">english</code>（英语）分析器替换默认的 <code class="literal">standard</code>（标准）分析器</p>
</td>
</tr>
</table>
</div>
<p>当然，文本经过 <code class="literal">english</code> 分析处理，我们会丢失源数据：</p>
<div class="pre_wrapper lang-js">
<pre class="programlisting prettyprint lang-js">GET /my_index/_analyze?field=title <a id="CO125-1"></a><i class="conum" data-value="1"></i>
I'm not happy about the foxes</pre>
</div>
<div class="calloutlist">
<table border="0" summary="Callout list">
<tr>
<td align="left" valign="top" width="5%">
<p><a href="#CO125-1"><i class="conum" data-value="1"></i></a></p>
</td>
<td align="left" valign="top">
<p>切词为: <code class="literal">i'm</code>，<code class="literal">happi</code>，<code class="literal">about</code>，<code class="literal">fox</code></p>
</td>
</tr>
</table>
</div>
<p>我们无法分辨源文档中是包含单数 <code class="literal">fox</code> 还是复数 <code class="literal">foxes</code> ；单词 <code class="literal">not</code> 因为是停用词所以被移除了，
所以我们无法分辨源文档中是happy about foxes还是not happy about foxes，虽然通过使用 <code class="literal">english</code>
（英语）分析器，使得匹配规则更加宽松，我们也因此提高了召回率，但却降低了精准匹配文档的能力。</p>
<p>为了获得两方面的优势，我们可以使用<a class="xref" href="multi-fields.html" title="字符串排序与多字段">multifields</a>（多字段）对 <code class="literal">title</code> 字段建立两次索引：
一次使用 <code class="literal">english</code>（英语）分析器，另一次使用 <code class="literal">standard</code>（标准）分析器:</p>
<div class="pre_wrapper lang-js">
<pre class="programlisting prettyprint lang-js">PUT /my_index
{
  "mappings": {
    "blog": {
      "properties": {
        "title": { <a id="CO126-1"></a><i class="conum" data-value="1"></i>
          "type": "string",
          "fields": {
            "english": { <a id="CO126-2"></a><i class="conum" data-value="2"></i>
              "type":     "string",
              "analyzer": "english"
            }
          }
        }
      }
    }
  }
}</pre>
</div>
<div class="calloutlist">
<table border="0" summary="Callout list">
<tr>
<td align="left" valign="top" width="5%">
<p><a href="#CO126-1"><i class="conum" data-value="1"></i></a></p>
</td>
<td align="left" valign="top">
<p>主 <code class="literal">title</code> 字段使用 <code class="literal">standard</code>（标准）分析器。</p>
</td>
</tr>
<tr>
<td align="left" valign="top" width="5%">
<p><a href="#CO126-2"><i class="conum" data-value="2"></i></a></p>
</td>
<td align="left" valign="top">
<p><code class="literal">title.english</code> 子字段使用 <code class="literal">english</code>（英语）分析器。</p>
</td>
</tr>
</table>
</div>
<p>替换为该字段映射后，我们可以索引一些测试文档来展示怎么在搜索时使用两个字段：</p>
<div class="pre_wrapper lang-js">
<pre class="programlisting prettyprint lang-js">PUT /my_index/blog/1
{ "title": "I'm happy for this fox" }

PUT /my_index/blog/2
{ "title": "I'm not happy about my fox problem" }

GET /_search
{
  "query": {
    "multi_match": {
      "type":     "most_fields", <a id="CO127-1"></a><i class="conum" data-value="1"></i>
      "query":    "not happy foxes",
      "fields": [ "title", "title.english" ]
    }
  }
}</pre>
</div>
<div class="calloutlist">
<table border="0" summary="Callout list">
<tr>
<td align="left" valign="top" width="5%">
<p><a href="#CO127-1"><i class="conum" data-value="1"></i></a></p>
</td>
<td align="left" valign="top">
<p>使用<a class="xref" href="most-fields.html" title="多数字段"><code class="literal">most_fields</code></a> query type（多字段搜索语法来）让我们可以用多个字段来匹配同一段文本。</p>
</td>
</tr>
</table>
</div>
<p>感谢 <code class="literal">title.english</code> 字段的切词，无论我们的文档中是否含有单词 <code class="literal">foxes</code> 都会被搜索到，第二份文档的相关性排行要比第一份高，
因为在 <code class="literal">title</code> 字段中匹配到了单词 <code class="literal">not</code> 。</p>
</div>
<div class="navfooter">
<span class="prev">
<a href="language-intro.html">« 开始处理各种语言</a>
</span>
<span class="next">
<a href="configuring-language-analyzers.html">配置语言分析器 »</a>
</span>
</div>
</div>

                  <!-- end body -->
                        </div>
                        <div class="col-xs-12 col-sm-4 col-md-4" id="right_col">
                        
                        </div>
                    </div>
                </div>
            </section>
        </div>
    </section>
</div>
<script src="../static/cn.js"></script>
</body>
</html>